Make a highlighted time series

Time series comparisons can be handy for seeing trends in data. Here, we explore how to look at some stuff about ducks

Eukaryota
Animalia
Aves
Summaries
Authors

Thai Rushbrook

Olivia Torresan

Dax Kellie

Published

March 7, 2022

Author

Thai Rushbrook
Olivia Torresan
Dax Kellie

Date

7 March 2023

Thanks to huge efforts by citizen scientists, a majority of species observations in the Atlas of Living Australia are collected opportunistically, where people record observations incidentally rather than through a recurring monitoring program.

However, these observations are susceptible to things unrelated to the species itself. It might be rainy, it might be too hot, an area might be inaccessible; all of these factors can affect whether people record an observation or not.

The COVID-19 pandemic had a major impact on people’s health, behaviour and travel. In Australia, several shorter lockdowns over 2020-2021 imposed restrictions on people’s movements. Melbourne experienced the longest continuous lockdown in the world. These restrictions limited people to either stay at home or choose activities (mainly exercise) that they could do in their local area. To what extent did COVID-19 affect the number of species observations people made over that time?

Here, we’ll use a highlighted time-series plot to investigate how lockdowns affected the data collection of Anatidae observations (ducks, geese and swans), a group frequently recorded on walks and outdoor gatherings, in Melbourne compared to previous years.

Get data

We’ll start by downloading Anatidae records.

First, let’s load some packages:

# Load packages
library(galah)
library(tidyverse)
library(lubridate)
library(grid)
library(ggplot2)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(showtext)

Let’s use the {galah} package to download duck records in Melbourne from 2017-2021 to get records from years just before and during COVID-19 lockdowns.

Searching with galah::search_fields() shows us that {galah} contains Greater Capital City Statistical Areas, which we can use to filter our query.

search_all(fields, "city")
# A tibble: 1 × 4
  id      description                                                type  link 
  <chr>   <chr>                                                      <chr> <chr>
1 cl10929 PSMA ABS Greater Capital City Statistical Areas (2016) AB… laye… http…
search_all(fields, "city") |> search_values("melbourne")
• Showing values for 'cl10929'.
# A tibble: 1 × 2
  field   category         
  <chr>   <chr>            
1 cl10929 GREATER MELBOURNE

Let’s build our query to return Anatidae records from GREATER MELBOURNE from 2017 to 2021. We’ll use galah_select() to return only the eventDate column.

You will need to first provide a registered email with the ALA using galah_config() before retrieving records.

# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")
ducks <-
  galah_call() |>
  galah_identify("Anatidae") |>
  galah_filter(
    cl10929 == "GREATER MELBOURNE",
    # year >= 2017,
    # year <= 2021,
    eventDate >= "2017-01-01T00:00:00Z",
    eventDate <= "2021-12-31T23:59:00Z",
    basisOfRecord == "HUMAN_OBSERVATION"
  ) |>
  galah_select(eventDate) |>
  atlas_occurrences()

ducks |> head(6L)
# A tibble: 6 × 1
  eventDate          
  <dttm>             
1 2017-01-01 13:00:00
2 2017-01-01 13:00:00
3 2017-01-01 13:00:00
4 2017-01-01 13:00:00
5 2017-01-01 13:00:00
6 2017-01-01 13:00:00

We’ll pull out some week and year of each date and count the total occurrence records for each week.

ducks_weekly <- ducks |> 
  mutate(date = as_date(eventDate),
         year = year(eventDate),
         week = week(eventDate)) |>
  group_by(year, week) |>
  summarise(week_count = n())
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
ducks_weekly 
# A tibble: 265 × 3
# Groups:   year [5]
    year  week week_count
   <dbl> <dbl>      <int>
 1  2017     1        647
 2  2017     2        670
 3  2017     3        665
 4  2017     4        790
 5  2017     5        745
 6  2017     6        580
 7  2017     7        658
 8  2017     8        680
 9  2017     9        575
10  2017    10        541
# … with 255 more rows

We are intending to compare record counts in 2020-2021 to previous years. However, because record numbers added to the ALA have grown each year, it’s likely that comparing raw numbers will show a biased view that there were more records than previous years.

To avoid this, let’s scale our record counts by the total number of Anatidae records each year so that we compare proportions rather than raw numbers.

First let’s download the total Anatidae records for each year.

ducks_yearly <- 
  galah_call() |>    
  galah_identify("Anatidae") |> 
  galah_filter(cl10929 == "GREATER MELBOURNE", 
               year >= 2017, year <= 2021) |> 
  galah_group_by(year) |>
  atlas_counts() |>
  rename(year_total = count) |>
  mutate(year = as.numeric(year)) |>
  arrange(-desc(year))
  
ducks_yearly
# A tibble: 5 × 2
   year year_total
  <dbl>      <int>
1  2017      37194
2  2018      48696
3  2019      47874
4  2020      48850
5  2021      57520

Now we’ll split ducks_weekly into 5 data.frames for each year and divide the record count by the total for each year. At the end we’ll bind the separate data.frames into one again.

ducks_scaled <- ducks_weekly %>%
  split(.$year) %>%
  map2(.x = .,
       .y = ducks_yearly$year_total, 
       ~ .x %>%
         mutate(prop = .x$week_count / .y * 100) %>%
         select(-week_count)
  ) %>%
  bind_rows()

ducks_scaled
# A tibble: 265 × 3
# Groups:   year [5]
    year  week  prop
   <dbl> <dbl> <dbl>
 1  2017     1  1.74
 2  2017     2  1.80
 3  2017     3  1.79
 4  2017     4  2.12
 5  2017     5  2.00
 6  2017     6  1.56
 7  2017     7  1.77
 8  2017     8  1.83
 9  2017     9  1.55
10  2017    10  1.45
# … with 255 more rows

Finally, we’ll place our weekly proportions in separate columns with pivot_wider().

In our final plot we want to compare the “normal” number of records collected to those in 2020 and 2021, and it would be nice to represent the “normal” number as one line representing the average in “normal” years. With that goal in mind, we’ll also calculate the mean proportion of records each week from 2017-2019.

ducks_scaled <- ducks_scaled %>%
  pivot_wider(names_from = year, 
              values_from = prop, 
              names_sort = TRUE,
              names_glue = "year_{year}") |>
  rowwise() |>
  mutate(mean_2017_19 = mean(c_across(year_2017:year_2019)))

ducks_scaled
# A tibble: 53 × 7
# Rowwise: 
    week year_2017 year_2018 year_2019 year_2020 year_2021 mean_2017_19
   <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>        <dbl>
 1     1      1.74      1.74      3.22      2.83      2.18         2.23
 2     2      1.80      1.89      2.95      1.91      1.86         2.21
 3     3      1.79      1.56      2.65      2.51      2.02         2.00
 4     4      2.12      1.53      2.30      2.43      1.75         1.99
 5     5      2.00      1.62      1.85      1.59      2.03         1.83
 6     6      1.56      1.34      1.78      1.75      2.05         1.56
 7     7      1.77      1.77      1.87      1.47      1.57         1.80
 8     8      1.83      1.37      1.98      1.54      1.90         1.73
 9     9      1.55      1.45      1.33      1.28      1.84         1.44
10    10      1.45      1.25      2.12      1.17      1.67         1.61
# … with 43 more rows

Now we have our dataset, it needs a bit of reorganising to make it suitable for plotting. Counts are grouped by week of the year, meaning we have 2 sets of weeks 1-52 (one for 2020, one for 2021). To plot these in order along an axis we need to convert this to 1-53 for 2020 (a leap year, extra week), then 54-106 for 2021.

# 2021 record count proportions
ducks_scaled_2021 <- ducks_scaled |>
  select(week, 
         prop = year_2021, 
         mean_2017_19) |>
  mutate(week = 53 + week)

ducks_scaled_2021 |> head(5L)
# A tibble: 5 × 3
# Rowwise: 
   week  prop mean_2017_19
  <dbl> <dbl>        <dbl>
1    54  2.18         2.23
2    55  1.86         2.21
3    56  2.02         2.00
4    57  1.75         1.99
5    58  2.03         1.83
# 2020 + 2021 record count proportions
ducks_final <- ducks_scaled |>
  select(week, 
         prop = year_2020, 
         mean_2017_19) |>
  bind_rows(ducks_scaled_2021)

ducks_final
# A tibble: 106 × 3
# Rowwise: 
    week  prop mean_2017_19
   <dbl> <dbl>        <dbl>
 1     1  2.83         2.23
 2     2  1.91         2.21
 3     3  2.51         2.00
 4     4  2.43         1.99
 5     5  1.59         1.83
 6     6  1.75         1.56
 7     7  1.47         1.80
 8     8  1.54         1.73
 9     9  1.28         1.44
10    10  1.17         1.61
# … with 96 more rows

Lockdowns

During the height of the pandemic, Melbourne had 6 distinct lockdowns. Let’s add their start and end dates to a table.

n_lockdown <- c(1:6)
start_date <- c("2020-03-31", "2020-07-09",
                "2021-02-13", "2021-05-28",
                "2021-07-16", "2021-08-05")
end_date <- c("2020-05-12", "2020-10-27",
              "2021-02-17", "2021-06-10",
              "2021-07-27", "2021-10-21")

lockdowns <- tibble(n_lockdown, start_date, end_date) |>
  mutate(
    n_days = as_date(ymd(end_date)) - as_date(ymd(start_date)),
    weekstart = week(start_date),
    weekend = week(end_date))

lockdowns 
# A tibble: 6 × 6
  n_lockdown start_date end_date   n_days   weekstart weekend
       <int> <chr>      <chr>      <drtn>       <dbl>   <dbl>
1          1 2020-03-31 2020-05-12  42 days        13      19
2          2 2020-07-09 2020-10-27 110 days        28      43
3          3 2021-02-13 2021-02-17   4 days         7       7
4          4 2021-05-28 2021-06-10  13 days        22      23
5          5 2021-07-16 2021-07-27  11 days        29      30
6          6 2021-08-05 2021-10-21  77 days        31      42
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2023-03-10
 pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.2.2)
 galah       * 1.5.1   2023-02-21 [1] Github (AtlasOfLivingAustralia/galah@bd43dd2)
 ggplot2     * 3.4.1   2023-02-10 [1] CRAN (R 4.2.2)
 htmltools   * 0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
 lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.2.2)
 pilot       * 4.0.0   2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
 purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
 readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.2.2)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
 showtext    * 0.9-5   2022-02-09 [1] CRAN (R 4.2.1)
 showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.2.1)
 stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 sysfonts    * 0.8.8   2022-03-13 [1] CRAN (R 4.2.1)
 tibble      * 3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.2.2)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.2.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.2.2/library

──────────────────────────────────────────────────────────────────────────────